Search for: All records

Creators/Authors contains: "Brejova, Brona"

  1. Abstract: Structural variants (SVs) are medium and large-scale genomic alterations that shape phenotypic diversity and disease risk. Numerous methods have been proposed for discovering SVs, however their benchmarking has been inconsistent across studies, often resulting in contradictory findings. One of the main sources of conflicting evaluation results is the lack of consistency in the SV callsets used as ground truth, ranging from curated callsets released by consortia to more recent approaches that construct callsets from high-quality telomere-to-telomere de novo haplotype assemblies. The discrepancies between benchmarks are further compounded by the choice of the reference genome (GRCh37, GRCh38, and T2T-CHM13), where using T2T-CHM13 reveals a different deletion/insertion profile, indicating reduced reference bias. We evaluated the performance of several state-of-the-art SV discovery methods from long-read whole-genome sequencing data and observed substantial variation in their performance and rankings, depending on the choice of ground truth, reference genome, and genomic regions used for evaluation. Counter-intuitively, the more complete reference genome T2T-CHM13 does not inherently solve the problem of SV benchmarking; instead it reveals the limitations of each detection method in complex genomic regions. The substantial variation in detection accuracy across different genomic regions calls for additional caution in downstream analyses and in drawing conclusions based on predicted SVs. These findings underscore the complexity of evaluating SV detection methods and highlight the need for careful consideration and, ideally, field-standard best practices when reporting performance metrics.
  2. Abstract: Pangenomes are becoming increasingly popular data structures for genomics analyses due to their ability to compactly represent the genetic diversity within populations. Constructing a pangenome graph, however, is still a time-consuming and expensive process. A promising approach for pangenome construction consists of progressively augmenting a pangenome graph with additional high-quality assemblies. Currently, there is no method for augmenting a pangenome graph with unassembled reads from newly sequenced samples without first aligning the reads to a reference genome and performing variant calling and genotyping on the new individuals. In this work, we present the first assembly-free and mapping-free approach for augmenting an existing pangenome graph using unassembled long reads from an individual not already present in the pangenome. Our approach consists of finding sample-specific sequences in reads using efficient indexes, clustering reads corresponding to the same novel variant(s), and then building a consensus sequence to be added to the pangenome graph for each variant separately. Using simulated reads based on Human Pangenome Reference Consortium (HPRC) assemblies, we demonstrate the effectiveness of the proposed approach for progressively augmenting the pangenome with long reads, without the need for de novo assembly or predicting genetic variants of the new sample. The software is freely available at https://github.com/ldenti/palss.